12 Automating commutativity analysis at the design level 
Greg Dennis, Robert Seater, Derek Rayside, Daniel Jackson 

July 2004 ACM SIGSOFT Software Engineering Notes, Proceedings of the 2004 ACM SIGSOFT
international symposium on Software testing and analysis, Volume 29 Issue 4

Full text available: pdf(129.59 KB) Additional Information: full citation, abstract, references, index terms

Two operations commute if executing them serially in either order results in the same 
change of state. In a system in which commands may be issued simultaneously by different 
users, lack of commutativity can result in unpredictable behaviour, even if the commands 
are serialized, because one user's command may be preempted by another's, and thus 
executed in an unanticipated state. This paper describes an automated approach to 
analyzing commutativity. The operations are expressed as constraints in ... 

Keywords: OCL, alloy, case study, commutativity, concurrency, critical systems, formal 
specification, lightweight formal methods, model checking, proton therapy, radiation 
therapy, testing 
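
The definition of commutativity quoted in the abstract above can be illustrated with a small sketch. This is not the paper's method (the abstract says the operations are expressed as constraints and analyzed automatically); it simply checks the definition by brute force over a toy integer state. The names `commute`, `add_five`, `add_two`, and `cap_at_ten` are hypothetical and chosen only for illustration.

```python
def commute(op_a, op_b, states):
    """Return True if running op_a then op_b leaves the same state as
    running op_b then op_a, for every starting state in `states`."""
    return all(op_b(op_a(s)) == op_a(op_b(s)) for s in states)


# Hypothetical operations on a single integer state.
def add_five(s):
    return s + 5


def add_two(s):
    return s + 2


def cap_at_ten(s):
    return min(s, 10)


if __name__ == "__main__":
    states = range(20)
    # Two additions give the same result in either order.
    print(commute(add_five, add_two, states))
    # Adding and capping do not: from state 6, add-then-cap gives 10,
    # while cap-then-add gives 11.
    print(commute(add_five, cap_at_ten, states))
```

An exhaustive check like this only works for small, finite state spaces.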
Compiling for distributed-memory machines has been a very active research area in recent
years. Much of this work has concentrated on programs that use arrays as their primary 
data structures. To date, little work has been done to address the problem of supporting 
programs that use pointer-based dynamic data structures. The techniques developed for 
supporting SPMD execution of array-based programs rely on the fact that arrays are 
statically defined and directly addressable. Recursive data s ... 

Keywords: dynamic data structures 
June 1997 ACM SIGPLAN Notices, Proceedings of the sixth ACM SIGPLAN symposium
on Principles and practice of parallel programming, Volume 32 Issue 7

Many of today's high level parallel languages support dynamic, fine-grained parallelism.
These languages allow the user to expose all the parallelism in the program, which is 
typically of a much higher degree than the number of processors. Hence an efficient 
scheduling algorithm is required to assign computations to processors at runtime. Besides 
having low overheads and good load balancing, it is important for the scheduling algorithm 
to minimize the space usage of the parallel program. This pa ... 

Keywords: dynamic scheduling, language implementation, multithreading, nested 
parallelism, space efficiency 
With the advent of real-time and goal-oriented database systems, priority scheduling is
likely to be an important feature in future database management systems. A consequence of 
priority scheduling is that a transaction may lose its buffers to higher-priority transactions, 
and may be given additional memory when transactions leave the system. Due to their 
heavy reliance on main memory, hash joins are especially vulnerable to fluctuations in 
memory availability. Previous studies have propose ... 
We discuss the issues about the interdependence between code scheduling and register
allocation. We present two methods as solutions: (1) an integrated code scheduling 
technique; and (2) a DAG-driven register allocator. The integrated code scheduling method 
combines two scheduling techniques— one to reduce pipeline delays and the other to 
minimize register usage— into a single phase. By keeping track of the number of available 
registers, the scheduler can choose the appropriate sc ... 
Volume 22 Issue 1

Modern compilers often implement function calls (or returns) in two steps: first, a "closure"
environment is properly installed to provide access for free variables in the target program
fragment; second, the control is transferred to the target by a "jump with arguments (for
results)." Closure conversion, which decides where and how to represent closures at
runtime, is a crucial step in the compilation of functional languages. This paper presents a
new alg ... 

Keywords: callee-save registers, closure conversion, closure representation, compiler 
optimization, flow analysis, heap-based compilation, space safety 
Solving problems of large sizes is an important goal for parallel machines with multiple CPU
and memory resources. In this paper, issues of efficient execution of overhead-sensitive 
parallel irregular computation under memory constraints are addressed. The irregular 
parallelism is modeled by task dependence graphs with mixed granularities. The trade-off in 
achieving both time and space efficiency is investigated. The main difficulty of designing 
efficient run-time system support is caused by the ... 

20 Query evaluation techniques for large databases 
Goetz Graefe 

June 1993 ACM Computing Surveys (CSUR), Volume 25 issue 2 
Database management systems will continue to manage large data volumes. Thus, efficient
algorithms for accessing and manipulating large sets and sequences will be required to 
provide acceptable performance. The advent of object-oriented and extensible database 
systems will not solve this problem. On the contrary, modern data models exacerbate the 
problem: In order to manipulate large sets of complex objects as efficiently as today's 
database systems manipulate simple records, query-processi ... 

Keywords: complex query evaluation plans, dynamic query evaluation plans, extensible 
database systems, iterators, object-oriented database systems, operator model of 
parallelization, parallel algorithms, relational database systems, set-matching algorithms, 
sort-hash duality 
Integrating object-oriented programming and protected objects in Ada 95

Integrating concurrent and object-oriented programming has been an active research topic since the
late 1980's. There is now a plethora of methods for achieving this integration. The majority of
approaches have taken a sequential object-oriented language and made it concurrent. A few 
approaches have taken a concurrent language and made it object-oriented. The most important of 
this latter class is the Ada 95 language, which is an extension to the object-based concurrent 
programming langua ... 

Keywords: Ada 95, concurrency, concurrent object-oriented programming, inheritance anomaly 
Many of today's high-level parallel languages support dynamic, fine-grained parallelism. These
languages allow the user to expose all the parallelism in the program, which is typically of a much 
higher degree than the number of processors. Hence an efficient scheduling algorithm is required to 
assign computations to processors at runtime. Besides having low overheads and good load 
balancing, it is important for the scheduling algorithm to minimize the space usage of the parallel 
program. T ... 

Keywords: dynamic scheduling, multithreading, nested parallelism, parallel language 
implementation, space efficiency 
Integrating concurrent and object-oriented programming has been an active research topic since the
late 1980's. There is now a plethora of methods for achieving this integration. The majority of 
approaches have taken a sequential object-oriented language and made it concurrent. A few 
approaches have taken a concurrent language and made it object-oriented. The most important of 
this latter class is the Ada 95 language, which is an extension to the object-based concurrent 
programming language Ada ... 

Keywords: Ada 95, concurrency, concurrent object-oriented programming, inheritance anomaly 
Choosing the right kind of register for a live range plays a major role in eliminating the
register-allocation overhead when the compiled function is frequently executed or function tails are on the
most frequently executed paths. Picking the wrong kind of register for a live range incurs a high 
penalty that may dominate the total overhead of register allocation. In this paper, we present three 
improvements, storage-class analysis, benefit-driven simplification, and preference decision that 
are ... 


8 Efficient implementation of the first-fit strategy for dynamic storage allocation
R. P. Brent 

July 1989 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 11 
Issue 3 

We describe an algorithm that efficiently implements the first-fit strategy for dynamic storage 
allocation. The algorithm imposes a storage overhead of only one word per allocated block (plus a 
few percent of the total space used for dynamic storage), and the time required to allocate or free a 
block is O(log W), where W is the maximum number of words allocated dynamically. The algorithm is
faster than many commonly used algorithms, especia ... 

9 Comparing the reliability provided by tasks or protected objects for implementing a resource 
allocation service: a case study 

C. Kaiser, J. F. Pradat-Peyre 

November 1997 Proceedings of the conference on TRI-Ada '97 



Compilation and run-time systems: A faster optimal register allocator
Changqing Fu, Kent Wilken 

November 2002 Proceedings of the 35th annual ACM/IEEE international symposium on 
Microarchitecture 

Recently researchers have proposed modeling register allocation as an integer linear programming 
(IP) problem and solving it optimally for general purpose processors [17, 20] and for dedicated 
embedded systems [23]. Compared with traditional graph-coloring approaches, the IP-based
allocators can improve a program's performance. However, the solution times are much slower. This
paper presents an IP-based optimal register allocator which is much faster than previous work. We
present several local a ... 

11 Scalable lock-free dynamic memory allocation
Maged M. Michael

June 2004 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2004 conference on
Programming language design and implementation, Volume 39 Issue 6

Full text available: pdf(213.34 KB) Additional Information: full citation, abstract, references, index terms

Dynamic memory allocators (malloc/free) rely on mutual exclusion locks for protecting the 
consistency of their shared data structures under multithreading. The use of locking has many 
disadvantages with respect to performance, availability, robustness, and programming flexibility. A 
lock-free memory allocator guarantees progress regardless of whether some threads are delayed or 
even killed and regardless of scheduling policies. This paper presents a completely lock-free memory 
allocator. It uses ... 

Keywords: async-signal-safe, availability, lock-free, malloc 



Perng-Ti Ma, T. G. Lewis

April 1980 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 2 Issue 2

Full text available: pdf(1.17 MB) Additional Information: full citation, abstract, references, citings, index terms

Methods are described to translate a certain machine-independent intermediate language (IML) to 
efficient microprograms for a class of horizontal microprogrammable machines. The IML is compiled 
directly from a high-level microprogramming language used to implement a virtual instruction set 
processor as a microprogram. The primary objective of the IML-to-host machine interface design is to 
facilitate language portability. Transportability is accomplished by use of a field descript ... 

14 A structural view of the Cedar programming environment

Daniel C. Swinehart, Polle T. Zellweger, Richard J. Beach, Robert B. Hagmann 

August 1986 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 8 Issue 4

Full text available: pdf(6.32 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper presents an overview of the Cedar programming environment, focusing on its overall 
structure— that is, the major components of Cedar and the way they are organized. Cedar supports 
the development of programs written in a single programming language, also called Cedar. Its 
primary purpose is to increase the productivity of programmers whose activities include experimental 
programming and the development of prototype software systems for a high-performance personal 
computer. T ... 

15 Supporting dynamic data structures on distributed-memory machines 
Anne Rogers, Martin C. Carlisle, John H. Reppy, Laurie J. Hendren 

March 1995 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 17 
Issue 2 

Full text available: pdf(2.05 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Compiling for distributed-memory machines has been a very active research area in recent years. 
Much of this work has concentrated on programs that use arrays as their primary data structures. To 
date, little work has been done to address the problem of supporting programs that use pointer- 
based dynamic data structures. The techniques developed for supporting SPMD execution of array- 
based programs rely on the fact that arrays are statically defined and directly addressable. Recursive 
data s ... 
Tao Yang, Cong Fu

November 1998 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 20 
Issue 6 

In this article we investigate the trade-off between time and space efficiency in scheduling and 
executing parallel irregular computations on distributed-memory machines. We employ acyclic task 
dependence graphs to model irregular parallelism with mixed granularity, and we use direct remote 
memory access to support fast communication. We propose new scheduling techniques and a run- 
time active memory management scheme to improve memory utilization while retaining good time 
efficiency, and we ... 

Keywords: DAG scheduling, direct remote memory access, irregular parallelism, run-time support 



Reorganizing global schedules for register allocation 
Gang Chen, Michael D. Smith 

May 1999 Proceedings of the 13th international conference on Supercomputing 


Keywords: instruction-level parallelism, register allocation, superblock scheduling 



18 Real time resource allocation in distributed systems

John Reif, Paul Spirakis 

August 1982 Proceedings of the first ACM SIGACT-SIGOPS symposium on Principles of 
distributed computing 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms



In this paper we consider a resource allocation problem which is local in the sense that the maximum 
number of users competing for a particular resource at any time instant is bounded and also at any 
time instant the maximum number of resources that a user is willing to get is bounded. The problem 
may be viewed as that of achieving matchings in dynamically changing hypergraphs, via a distributed 
algorithm. We show that this problem is related to the fundamental problem of < ... 

18 Fbufs: a high-bandwidth cross-domain transfer facility 
Peter Druschel, Larry L. Peterson 

December 1993 ACM SIGOPS Operating Systems Review, Proceedings of the fourteenth ACM
symposium on Operating systems principles, Volume 27 Issue 5

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

We have designed and implemented a new operating system facility for I/O buffer management and 
data transfer across protection domain boundaries on shared memory machines. This facility, called
fast buffers (fbufs), combines virtual page remapping with shared virtual memory, and exploits
locality in I/O traffic to achieve high throughput without compromising protection, security, or
modularity. Its goal is to help deliver the high bandwidth afforded by emerging high-speed networks to
user-leve ... 
January 1998 Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms

1 Provably efficient scheduling for languages with fine-grained parallelism
Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias
March 1999 Journal of the ACM (JACM), Volume 46 Issue 2

Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular 
work-time framework for parallel algorithm design, programs written in such languages can express 
the full parallelism in the program without specifying the mapping of program tasks to processors. A 
common concern in executing such programs is to schedule tasks to processors dynamically so as to 
minimize not only the execution time, but also the amount of space (memory) needed. Without 
caref ... 



2 Space-efficient scheduling of parallelism with synchronization variables

Guy E. Blelloch, Phillip B. Gibbons, Girija J. Narlikar, Yossi Matias 

June 1997 Proceedings of the ninth annual ACM symposium on Parallel algorithms and 
architectures 

Full text available: pdf Additional Information: full citation, references, citings, index terms



3 A performance study of the cancelback protocol for Time Warp 
Samir R. Das, Richard M. Fujimoto 

July 1993 ACM SIGSIM Simulation Digest, Proceedings of the seventh workshop on Parallel
and distributed simulation, Volume 23 Issue 1

This work presents results from an experimental evaluation of the space-time tradeoffs in Time Warp 
augmented with the cancelback protocol for memory management. An implementation of the 
cancelback protocol on Time Warp is described that executes on a shared memory multiprocessor, a 
32 processor Kendall Square Research Machine (KSR1). The implementation supports canceling back 
more than one object when memory has been exhausted. The limited memory performance of the 
system is evaluated for ... 

4 The priority-based coloring approach to register allocation 

Fred C. Chow, John L. Hennessy

October 1990 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 12 
Issue 4 

Global register allocation plays a major role in determining the efficacy of an optimizing compiler. 
Graph coloring has been used as the central paradigm for register allocation in modern compilers. A 
straightforward coloring approach can suffer from several shortcomings. These shortcomings are 
addressed in this paper by coloring the graph using a priority ordering. A natural method for dealing 
with the spilling emerges from this approach. The detailed algorithms for a priority-based colori ... 

5 A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors

Cathy McCann, Raj Vaswani, John Zahorjan 

May 1993 ACM Transactions on Computer Systems (TOCS), Volume 11 Issue 2 

We propose and evaluate empirically the performance of a dynamic processor-scheduling policy for 
multiprogrammed shared-memory multiprocessors. The policy is dynamic in that it reallocates 
processors from one parallel job to another based on the currently realized parallelism of those jobs.
The policy is suitable for implementation in production systems in that: —It interacts well with very 
efficient user-level thread packages, leaving to them many low-level thr ... 

Keywords: shared memory parallel processors, threads, two-level scheduling 



6 A resource management framework for priority-based physical-memory allocation
Kingsley Cheung, Gernot Heiser

January 2002 Australian Computer Science Communications , Proceedings of the seventh Asia- 
Pacific conference on Computer systems architecture - Volume 6, Volume 24 Issue 3 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Most multitasking operating systems support scheduling priorities in order to ensure that processor 
time is allocated to important or time-critical processes in preference to less important ones. Ideally 
this would prevent a low-priority process from slowing the execution of a high-priority one. In 
practice, strict prioritisation is undermined by a lack of suitable allocation policy for resources other 
than CPU time. For example, a low priority process may degrade the execution speed of a high-p ... 

7 Simultaneous reference allocation in code generation for dual data memory bank ASIPs 
Ashok Sudarsanam, Sharad Malik 

April 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 5 
Issue 2 



Full text available:
Additional Information: full citation, abstract, references, citings, index terms



We address the problem of code generation for DSP systems on a chip. In such systems, the amount 
of silicon devoted to program ROM is limited, so application software must be sufficiently dense.
Additionally, the software must be written so as to meet various high-performance constraints, which 
may include hard real-time constraints. Unfortunately, current compiler technology is unable to 
generate high-quality code for DSPs, whose architectures are highly irregular. Thus, designers often 
r ... 

Keywords: code generation, code optimization, graph labelling, memory bank assignment, register 
allocation 
8 External memory algorithms and data structures: dealing with massive data

Jeffrey Scott Vitter 

June 2001 ACM Computing Surveys (CSUR), Volume 33 Issue 2 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Data sets in large applications are often too massive to fit completely inside the computer's internal
memory. The resulting input/output communication (or I/O) between fast internal memory and 
slower external memory (such as disks) can be a major performance bottleneck. In this article we 
survey the state of the art in the design and analysis of external memory (or EM) algorithms and data 
structures, where the goal is to exploit locality in order to reduce the I/O costs. We consider a 
varie ... 

Keywords: B-tree, I/O, batched, block, disk, dynamic, extendible hashing, external memory, 
hierarchical memory, multidimensional access methods, multilevel memory, online, out-of-core, 
secondary storage, sorting 

9 Software transactional memory 
Nir Shavit, Dan Touitou 

August 1995 Proceedings of the fourteenth annual ACM symposium on Principles of distributed 
computing 

Full text available: pdf Additional Information: full citation, references, citings, index terms



10 Static scheduling algorithms for allocating directed task graphs to multiprocessors 
Yu-Kwong Kwok, Ishfaq Ahmad 

December 1999 ACM Computing Surveys (CSUR), Volume 31 Issue 4 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms



Static scheduling of a program represented by a directed task graph on a multiprocessor system to 
minimize the program completion time is a well-known problem in parallel processing. Since finding
an optimal schedule is an NP-complete problem in general, researchers have resorted to devising 
efficient heuristics. A plethora of heuristics have been proposed based on a wide spectrum of 
techniques, including branch-and-bound, integer-programming, searching, graph-theory, 
randomization, genetic ... 

Keywords: DAG, automatic parallelization, multiprocessors, parallel processing, software tools, static
scheduling, task graphs 



11 Fusion-based register allocation 

Guei-Yuan Lueh, Thomas Gross, Ali-Reza Adl-Tabatabai 

May 2000 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 22 
Issue 3 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms, review

The register allocation phase of a compiler maps live ranges of a program to registers. If there are 
more candidates than there are physical registers, the register allocator must spill a live range (the 
home location is in memory) or split a live range (the live range occupies multiple locations). One of 
the challenges for a register allocator is to deal with spilling and splitting together. Fusion-based 
register allocation uses the structure of the program to make splitting and spilling d ... 

Keywords: performance evaluation, register allocation 



12 A simple interprocedural register allocation algorithm and its effectiveness for LISP

Peter A. Steenkiste, John L. Hennessy 

January 1989 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 11 
Issue 1 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms, review

Register allocation is an important optimization in many compilers, but with per-procedure register 
allocation, it is often not possible to make good use of a large register set. Procedure calls limit the 
improvement from global register allocation, since they force variables allocated to registers to be 
saved and restored. This limitation is more pronounced in LISP programs due to the higher frequency 
of procedure calls. An interprocedural register allocation algorithm is developed by simp ... 

13 Evaluating models of memory allocation
Benjamin Zorn, Dirk Grunwald

January 1994 ACM Transactions on Modeling and Computer Simulation (TOMACS), Volume 4 Issue 1 

Full text available: pdf(1.69 MB) Additional Information: full citation, abstract, references, citings, index terms

Because dynamic memory management is an important part of a large class of computer programs, 
high-performance algorithms for dynamic memory management have been and will continue to be of 
considerable interest. The goal of this research is to explore the size and accuracy of synthetic 
models of program allocation behavior. These models, if accurate enough, provide an attractive
alternative to algorithm evaluation based on trace-driven simulation using actual traces. Based on our 
analysis, w ... 

Keywords: dynamic storage allocation, model evaluation, program behavior modeling, program 
measurement, trace-driven simulation 



14 Managing memory for real-time queries

Hwee Hwa Pang, Michael J. Carey, Miron Livny 

May 1994 ACM SIGMOD Record , Proceedings of the 1994 ACM SIGMOD international 
conference on Management of data, Volume 23 Issue 2 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

The demanding performance objectives that real-time database systems (RTDBS) face necessitate 
the use of priority resource scheduling. This paper introduces a Priority Memory Management (PMM) 
algorithm that is designed to schedule queries in RTDBS. PMM attempts to minimize the number of 
missed deadlines by adapting both its multiprogramming level and its memory allocation strategy to 
the characteristics of the offered workload. A series of simulation experiments confirms th ... 

15 Applying priorities to memory allocation 
Sven G. Robertz 

June 2002 ACM SIGPLAN Notices , Proceedings of the 3rd international symposium on Memory 
management, Volume 38 Issue 2 supplement 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

A novel approach of applying priorities to memory allocation is presented and it is shown how this can 
be used to enhance the robustness of real-time applications. The proposed mechanisms can also be 
used to increase performance of systems with automatic memory management by limiting the
amount of garbage collection work. A way of introducing priorities for memory allocation in a Java
system without making any changes to the syntax of the language is proposed and this has been 
implemented in an e ... 

Keywords: embedded systems, memory allocation, real-time garbage collection, robustness 



16 Scalable lock-free dynamic memory allocation
Maged M. Michael 

June 2004 ACM SIGPLAN Notices , Proceedings of the ACM SIGPLAN 2004 conference on 
Programming language design and implementation, Volume 39 Issue 6 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Dynamic memory allocators (malloc/free) rely on mutual exclusion locks for protecting the 
consistency of their shared data structures under multithreading. The use of locking has many 
disadvantages with respect to performance, availability, robustness, and programming flexibility. A 
lock-free memory allocator guarantees progress regardless of whether some threads are delayed or 
even killed and regardless of scheduling policies. This paper presents a completely lock-free memory 
allocator. It uses ... 

Keywords: async-signal-safe, availability, lock-free, malloc 
17 Improvements to graph coloring register allocation
Preston Briggs, Keith D. Cooper, Linda Torczon

May 1994 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 16 
Issue 3 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms, review

We describe two improvements to Chaitin-style graph coloring register allocators. The first, optimistic 
coloring, uses a stronger heuristic to find a k-coloring for the interference graph. The second extends 
Chaitin's treatment of rematerialization to handle a larger class of values. These techniques are 
complementary. Optimistic coloring decreases the number of procedures that require spill code and 
reduces the amount of spill code when sp ... 

Keywords: code generation, graph coloring, register allocation 



18 Efficient register allocation via coloring using clique separators

May 1994 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 16

Full text available: pdf(1.15 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Although graph coloring is widely recognized as an effective technique for register allocation, memory 
demands can become quite high for large interference graphs that are needed in coloring. In this 
paper we present an algorithm that uses the notion of clique separators to improve the space 
overhead of coloring. The algorithm, based on a result by R. Tarjan regarding the colorability of 
graphs, partitions program code into code segments using clique separators. The interference graphs 
for ...

Keywords: clique separators, graph coloring, interference graph, node priorities, spans, spill code 
bus architectures and memory allocation

September 2004 Proceedings of the 2nd IEEE/ACM/IFIP international conference on
Hardware/software codesign and system synthesis

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Separation between computation and communication in system design allows the system designer to 
explore the communication architecture independently of component selection and mapping. In this 
paper we present an iterative two-step exploration methodology for bus-based on-chip 
communication architecture and memory allocation, assuming that memory traces from the 
processing elements are given from the mapping stage. The proposed method uses a static 
performance estimation technique to reduce the ... 

Keywords: communication architecture optimization, design space exploration, memory allocation, 
system-on-a-chip 
20 A parallel execution model for a database machine with high performances
Didier Donsez, Pascal Faudemay 

July 1990 Proceedings of the second international symposium on Databases in parallel and 
distributed systems 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

In this paper, we present a mixed MIMD / SIMD execution model for a reconfigurable computer. This 
model is adapted to the use of a specialized associative coprocessor, embedded in this host machine. 
A main characteristic of the model is that it uses four types of processes (decoding, calculus, 
coprocessor communication and transaction manager), and that in principle one process of each type 
is allowed on each processor. Time intervals are allocated to operations into partitions oft ... 